Approximate policy iteration using regularised Bellman residuals minimisation

Authors
Abstract

Related Articles

Approximate policy iteration using regularised Bellman residuals minimisation

In this paper we present an Approximate Policy Iteration (API) method, called API-BRM, that uses a very effective implementation of incremental Support Vector Regression (SVR) to approximate a value function able to generalize in continuous (or large) space Reinforcement Learning (RL) problems. RL is a methodology able to solve complex and uncertain decision problems, usually modeled as Markov Decisi...
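
The abstract only names the ingredients, so a rough sketch may help. Below is a simplified, batch illustration of one policy-evaluation step by Bellman residual regression with an SVR: it is not the incremental SVR implementation the paper describes and not the authors' code; the bootstrap target is held fixed during each regression, the regularisation comes from the SVR's C parameter, and the transition format and state-action encoding are assumptions.

import numpy as np
from sklearn.svm import SVR

GAMMA = 0.95  # discount factor (assumed)

def evaluate_policy_brm(transitions, policy, q_model):
    # Regress Q(s, a) towards the Bellman target r + gamma * Q(s', pi(s')),
    # using the previous model (None on the first pass) for the bootstrap term.
    X, y = [], []
    for s, a, r, s_next, done in transitions:
        if q_model is None or done:
            target = r
        else:
            a_next = policy(s_next)
            target = r + GAMMA * q_model.predict([np.append(s_next, a_next)])[0]
        X.append(np.append(s, a))
        y.append(target)
    # The epsilon-SVR constant C supplies the regularisation term.
    new_model = SVR(kernel="rbf", C=10.0, epsilon=0.01)
    new_model.fit(np.array(X), np.array(y))
    return new_model

def greedy_policy(q_model, n_actions):
    # Policy improvement: pick the action with the highest predicted Q value.
    def policy(s):
        scores = [q_model.predict([np.append(s, a)])[0] for a in range(n_actions)]
        return int(np.argmax(scores))
    return policy

Alternating evaluate_policy_brm and greedy_policy gives the overall API loop.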

Approximate Policy Iteration using Large-Margin Classifiers

We present an approximate policy iteration algorithm that uses rollouts to estimate the value of each action under a given policy in a subset of states and a classifier to generalize and learn the improved policy over the entire state space. Using a multiclass support vector machine as the classifier, we obtained successful results on the inverted pendulum and the bicycle balancing and riding d...
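
As a point of reference, the rollout-plus-classifier loop described above can be sketched as follows. The environment interface (env.set_state, env.step), the horizon, and the rollout count are hypothetical details added for illustration; the multiclass SVM stands in for the classifier mentioned in the abstract, and this is not the authors' implementation.

import numpy as np
from sklearn.svm import SVC

GAMMA, HORIZON, N_ROLLOUTS = 0.95, 100, 10  # assumed settings

def rollout_value(env, state, action, policy):
    # Monte-Carlo estimate of Q^pi(state, action): take `action`, then follow
    # `policy` for a fixed horizon, averaging the discounted return.
    total = 0.0
    for _ in range(N_ROLLOUTS):
        env.set_state(state)                  # hypothetical simulator reset
        s, r, done = env.step(action)
        ret, discount = r, GAMMA
        for _ in range(HORIZON):
            if done:
                break
            s, r, done = env.step(policy(s))
            ret += discount * r
            discount *= GAMMA
        total += ret
    return total / N_ROLLOUTS

def policy_improvement_step(env, states, policy, n_actions):
    # Label each sampled state with its rollout-greedy action, then train a
    # multiclass SVM to generalise that choice over the whole state space.
    labels = [int(np.argmax([rollout_value(env, s, a, policy) for a in range(n_actions)]))
              for s in states]
    clf = SVC(kernel="rbf")
    clf.fit(np.array(states), np.array(labels))
    return lambda s: int(clf.predict([s])[0])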

Approximate Modified Policy Iteration

Modified policy iteration (MPI) is a dynamic programming (DP) algorithm that contains the two celebrated policy and value iteration methods. Despite its generality, MPI has not been thoroughly studied, especially its approximation form which is used when the state and/or action spaces are large or infinite. In this paper, we propose three implementations of approximate MPI (AMPI) that are exten...
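
For reference, exact MPI alternates a greedy step with m applications of the policy Bellman operator, and the two celebrated special cases fall out at the extremes of m. In standard notation (the textbook recursion, not the AMPI variants proposed in the paper):

\pi_{k+1} \in \mathrm{greedy}(V_k), \qquad V_{k+1} = (T_{\pi_{k+1}})^{m} V_k, \qquad T_{\pi} V = r_{\pi} + \gamma P_{\pi} V .

Taking m = 1 recovers value iteration, while letting m grow (solving the evaluation step exactly) recovers policy iteration; the approximate variants replace the greedy and evaluation steps with estimates computed from samples and function approximation.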

Error Bounds for Approximate Policy Iteration

In Dynamic Programming, convergence of algorithms such as Value Iteration or Policy Iteration results (in discounted problems) from a contraction property of the back-up operator, guaranteeing convergence to its fixed point. When approximation is considered, known results in Approximate Policy Iteration provide bounds on the closeness to optimality of the approximate value function obtained by suc...
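
The two facts being referred to can be stated compactly: with discount factor gamma, the Bellman back-up operator T is a sup-norm contraction, and the classical Approximate Policy Iteration bound (due to Bertsekas and Tsitsiklis) controls the asymptotic loss of the generated policies in terms of the per-iteration evaluation error epsilon:

\|T V - T V'\|_\infty \le \gamma \|V - V'\|_\infty, \qquad \limsup_{k \to \infty} \|V^{*} - V^{\pi_k}\|_\infty \le \frac{2\gamma}{(1-\gamma)^{2}} \, \epsilon .

Here epsilon is an upper bound on \|V_k - V^{\pi_k}\|_\infty, the sup-norm error of each approximate evaluation step.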

Approximate Policy Iteration with Demonstration Data

We propose an algorithm to solve uncertain sequential decision-making problems that utilizes two different types of data sources. The first is the data available in the conventional reinforcement learning setup: an agent interacts with the environment and receives a sequence of state transition samples alongside the corresponding reward signal. The second data source, which differentiates the s...
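
The snippet is cut off before it says how the demonstration data enter the algorithm, so the following is only a generic illustration of combining the two data sources, here as a fitted Q-iteration step in which demonstration transitions are pooled with the agent's own transitions and up-weighted; it is a sketch under assumed details, not the paper's method.

import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

GAMMA, DEMO_WEIGHT = 0.99, 3.0  # assumed discount and demonstration weight

def fitted_q_step(agent_batch, demo_batch, q_model, n_actions):
    # One regression of fitted Q-iteration over both data sources, giving the
    # demonstration transitions a larger sample weight.
    X, y, w = [], [], []
    for weight, batch in ((1.0, agent_batch), (DEMO_WEIGHT, demo_batch)):
        for s, a, r, s_next, done in batch:
            if q_model is None or done:
                target = r
            else:
                target = r + GAMMA * max(
                    q_model.predict([np.append(s_next, b)])[0] for b in range(n_actions))
            X.append(np.append(s, a))
            y.append(target)
            w.append(weight)
    model = ExtraTreesRegressor(n_estimators=50)
    model.fit(np.array(X), np.array(y), sample_weight=np.array(w))
    return model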


Journal

Journal Title: Journal of Experimental & Theoretical Artificial Intelligence

Year: 2015

ISSN: 0952-813X, 1362-3079

DOI: 10.1080/0952813x.2015.1024494